Large dataset partitioning using ensemble partition-based clustering with majority voting technique
نویسندگان
چکیده
<span lang="EN-US">Large datasets have become useful in data mining for processing, storing, and handling vast amounts of data. However, processing large is time-consuming memory intensive. As a result, the researchers adopted partitioning strategy to improve controllability performance reduce time required handle datasets. Unfortunately, numerous clustering techniques available literature could confuse experts choosing best given dataset. Furthermore, no technique can tackle all problems, such as cluster structure, noise, or density. To manage datasets, existing need scalable solutions. Therefore, this paper proposes an ensemble partition-based with majority voting dataset using aggregation k-means, k-medoids, fuzzy c-means, expectation-maximization (EM) density-based spatial applications noise (DBSCAN) techniques. These individually first stage. The final clusters are discovered next stage through among five algorithms. algorithms assigned instances most votes. experimental findings demonstrate that method surpasses other terms execution accuracy.</span>
منابع مشابه
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
متن کاملXML Document Partitioning using Ensemble Clustering
In this paper we propose a new technique for partitioning XML documents, in which conventional clustering techniques operating on flattened representations of individual aspects of the XML documents are combined to partition the available XML corpus. This offers the potential to divide the problem of catching content and structural regularities into simpler subproblems, in which only individual...
متن کاملPartition Selection Approach for Hierarchical Clustering Based on Clustering Ensemble
Hierarchical clustering algorithms are widely used in many fields of investigation. They provide a hierarchy of partitions of the same dataset. However, in many practical problems, the selection of a representative level (partition) in the hierarchy is needed. The classical approach to do so is by using a cluster validity index to select the best partition according to the criterion imposed by ...
متن کاملThe ensemble clustering with maximize diversity using evolutionary optimization algorithms
Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...
متن کاملA Review of Ensemble Technique for Improving Majority Voting for Classifier
Data classification plays important role in the field of data mining. The increasing rate of data diversity and size decrease the performance and efficiency of classifier. The decreasing performance of classifier compromised with unvoted data of classifier. Now the merging of two or more classifier for better prediction and voting of data are used, such techniques are called Ensemble classifier...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Indonesian Journal of Electrical Engineering and Computer Science
سال: 2023
ISSN: ['2502-4752', '2502-4760']
DOI: https://doi.org/10.11591/ijeecs.v29.i2.pp838-844